Abstract: We study the amount of randomness needed for an
input process to approximate a given output distribution of a
channel in the $E_\gamma$ distance. A general one-shot achievability
bound for the precision of such an approximation is developed. In
the i.i.d. setting where $\gamma = \exp(nE)$, a (nonnegative) randomness
rate above $\inf_{Q_U :\, D(Q_X\|\pi_X) \le E} \{D(Q_X\|\pi_X) + I(Q_U, Q_{X|U}) - E\}$ is
necessary and sufficient to asymptotically approximate the output
distribution $\pi_X^{\otimes n}$ using the channel $Q_{X|U}^{\otimes n}$, where
$Q_U \to Q_{X|U} \to Q_X$. The new resolvability result is then used to derive a
one-shot upper bound on the error probability in the rate distortion
problem, and a lower bound on the size of the eavesdropper list
to include the actual message in the wiretap channel problem.
Both bounds are asymptotically tight in i.i.d. settings.

I. Introduction

Approximation of a target output distribution with a given
channel has proved to be the key technical step in the
solution of many problems in information theory. In 1975
Wyner first studied such an approximation task to establish
the achievability part for what is now called Wyner's common
information [1], where he used the normalized relative entropy
to quantify the distance between the synthesized output
distribution and the target distribution. Later Han and Verdú
coined the term resolvability for the minimum rate of the
randomness needed for the input [2]. Motivated by the strong
converse of the identification coding theorem, [3] considered
resolvability in the total variation distance (TV), as well as
relative entropy. The achievability part of resolvability (also
known as the soft-covering lemma [4]) is particularly useful,
e.g. in secrecy [5] [6] [7], channel synthesis [4] and lossless
and lossy source coding [2] [8] [9]. Under both the normalized
relative entropy measure and TV, the resolvability can be shown
to be the minimum mutual information over all input
distributions inducing the target output distribution, and this
is also known to be true for unnormalized relative entropy (see
for example [3]).

In this paper we propose two new measures for approximation
of output statistics. The first one, excess information,
gives a straightforward upper bound on the second metric¹, the
$E_\gamma$ metric. The $E_\gamma$ metric was, to our knowledge, originally
introduced in [10] to simplify the formula of the DT bound
therein. The latter metric has clear operational significance and
reduces to the TV (up to a factor of 1/2) in the special case of
$\gamma = 1$, whereas the former is easier to upper-bound.
Asymptotically, however, the two metrics behave in the same way.
We derive a one-shot upper bound on the first (hence also the
second) metric in the resolvability problem. Bounding the new
metrics requires more care than the traditional TV to achieve
asymptotic tightness.

¹ Here “metric” and “distance” are used informally since they do not satisfy
either symmetry or the triangle inequality.

Particularly interesting is the case where the channel is
stationary memoryless and $\gamma$ grows exponentially as the number
of channel uses tends to infinity. In this case a single-letter
formula for the rate of randomness needed to approximate a
tensor power output distribution in $E_\gamma$ can be obtained from
the aforementioned one-shot bound. Here a peculiar feature
of approximation in $E_\gamma$ emerges: the distribution of each
codeword in the generation of the random codebook need
not induce the target output distribution through the stationary
memoryless channel, and in fact the optimal choice of such
a distribution (in the sense of requiring the minimum rate of
randomness) generally does not induce the target distribution.
This is in stark contrast to the case of the TV measure, where the
codeword distribution must induce the target distribution to
ensure that the total variation between the output distribution
and the target distribution does not converge to its maximum
value, 2, asymptotically.

Two applications of the new channel resolvability results 
are presented. First, the simplest application to lossy source 
coding yields a new achievability bound on the probability 
that the distortion lies below a certain number, which in the 
asymptotic setting recovers the exponent of this probability 
previously obtained using the method of types (cf. [11]). The
advantage of the new derivation is its applicability beyond the 
discrete memoryless framework. 

The second application is in the achievability part of wiretap
channels, where we propose a novel interpretation of secrecy
in terms of the eavesdropper’s ability to perform list decoding.
In contrast to the previous proofs for wiretap channels using
TV-resolvability [6] [7], which apply only when the rate is
below the perfect secrecy capacity, the new resolvability in $E_\gamma$
yields lower bounds on the required size of the eavesdropper
list for all possible rates. This interpretation of security in
terms of list size is reminiscent of equivocation [12], and
indeed we obtain the same formula in the asymptotic setting,
even though it is not immediate to prove a correspondence
between the two. We also consider the case where the
eavesdropper wishes to detect that no message is sent with high
probability. This is a practical setup because “no message”
may be a special piece of information which the eavesdropper
wants to know with high certainty. We obtain single-letter
expressions of the tradeoff between the transmission rate,
eavesdropper list, and the exponent of the probability that the
eavesdropper fails to detect non-message. Those bounds are
asymptotically tight for random codes.

II. Preliminaries 
A. Excess Information Metric 

One natural measure of the discrepancy between two
distributions $P$ and $Q$ on the same alphabet may be called the
excess information metric with threshold $\gamma$:

$$\mathbb{P}[\imath_{P\|Q}(X) > \log\gamma] \tag{1}$$

where $X \sim P$ and

$$\imath_{P\|Q}(x) := \log\frac{dP}{dQ}(x). \tag{2}$$

Notice that in addition to being more suitable for a one-shot
approach, (1) provides richer information than the relative
entropy measure since

$$D(P\|Q) = \int_{[0,+\infty)} \mathbb{P}[\imath_{P\|Q}(X) > \tau]\,d\tau - \int_{(-\infty,0]} \left(1 - \mathbb{P}[\imath_{P\|Q}(X) > \tau]\right)d\tau. \tag{3}$$

We note that the excess information metric does not satisfy
a data processing property. More precisely, suppose
$P_X \to P_{Y|X} \to P_Y$ and $Q_X \to P_{Y|X} \to Q_Y$; then it is not
always true that

$$\mathbb{P}[\imath_{P_X\|Q_X}(X) > \tau] \le \mathbb{P}[\imath_{P_Y\|Q_Y}(Y) > \tau] \tag{4}$$

where $(X,Y) \sim P_{XY}$.

B. The $E_\gamma(P\|Q)$ Metric

Next we consider another metric which does satisfy the data
processing inequality and has a clearer operational meaning.
Given probability distributions $P$, $Q$ and a constant $\gamma \ge 1$,
define an $f$-divergence [13]

$$E_\gamma(P\|Q) := \mathbb{P}[\imath_{P\|Q}(X) > \log\gamma] - \gamma\,\mathbb{P}[\imath_{P\|Q}(Y) > \log\gamma] \tag{5}$$

where $X \sim P$ and $Y \sim Q$. This quantity was introduced
in [10] to simplify the expression of the DT bound. From the
Neyman-Pearson lemma we have the alternative formula for
the above quantity:

$$E_\gamma(P\|Q) = \max_{A}\left(P(A) - \gamma Q(A)\right), \tag{6}$$

which becomes half of the total variation distance (the $\ell_1$
distance) between $P$ and $Q$ when $\gamma = 1$. Some basic properties
of $E_\gamma$ are in order:

Proposition 1. 1) For any event $A$,

$$Q(A) \ge \frac{1}{\gamma}\left(P(A) - E_\gamma(P\|Q)\right). \tag{7}$$

2) If $P_X P_{Y|X}$ and $Q_X Q_{Y|X}$ are joint distributions on
$\mathcal{X} \times \mathcal{Y}$, then

$$E_\gamma(P_X\|Q_X) \le E_\gamma(P_X P_{Y|X}\|Q_X Q_{Y|X}) \tag{8}$$

where equality holds when $P_{Y|X} = Q_{Y|X}$. In the latter
case we obtain the data processing inequality:

$$E_\gamma(P_Y\|Q_Y) \le E_\gamma(P_X\|Q_X). \tag{9}$$

3) Given $P_X$, $P_{Y|X}$ and $Q_{Y|X}$, define

$$E_\gamma(P_{Y|X}\|Q_{Y|X}|P_X) := \mathbb{E}\left[E_\gamma\left(P_{Y|X}(\cdot|X)\,\|\,Q_{Y|X}(\cdot|X)\right)\right] \tag{10}$$

where the expectation is w.r.t. $X \sim P_X$. Then

$$E_\gamma(P_X P_{Y|X}\|P_X Q_{Y|X}) = E_\gamma(P_{Y|X}\|Q_{Y|X}|P_X). \tag{11}$$
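The definitions of this section can be checked numerically on a finite alphabet. The following is a minimal sketch, assuming distributions are given as Python lists of probabilities; the function names and the example distributions are hypothetical and chosen only for illustration. It computes $E_\gamma$ via the closed form $\sum_x (P(x) - \gamma Q(x))^+$ (which follows from the maximization over events), via the definition with the event $\{dP/dQ > \gamma\}$, and via a brute-force search over all events, and it also computes the excess information.

```python
from itertools import combinations


def e_gamma(p, q, gamma):
    """E_gamma(P||Q) on a finite alphabet, computed as sum_x (p(x) - gamma*q(x))^+."""
    return sum(max(px - gamma * qx, 0.0) for px, qx in zip(p, q))


def e_gamma_from_definition(p, q, gamma):
    """P[dP/dQ > gamma] - gamma * Q[dP/dQ > gamma], using the event {p > gamma*q}."""
    event = [i for i in range(len(p)) if p[i] > gamma * q[i]]
    return sum(p[i] for i in event) - gamma * sum(q[i] for i in event)


def e_gamma_brute_force(p, q, gamma):
    """Maximize P(A) - gamma*Q(A) over all events A (exponential in alphabet size)."""
    best = 0.0  # value attained by the empty event
    for size in range(1, len(p) + 1):
        for event in combinations(range(len(p)), size):
            value = sum(p[i] for i in event) - gamma * sum(q[i] for i in event)
            best = max(best, value)
    return best


def excess_information(p, q, gamma):
    """Excess information: P[dP/dQ (X) > gamma] with X ~ P."""
    return sum(px for px, qx in zip(p, q) if px > gamma * qx)


# Hypothetical example distributions on a three-letter alphabet.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(e_gamma(p, q, 2.0), excess_information(p, q, 2.0))
```

In this example the three ways of computing $E_\gamma$ agree, $E_1$ equals half of the $\ell_1$ distance, and the excess information upper-bounds $E_\gamma$, in line with the remarks above.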
III. Achievability Bounds on Excess Information

We present a one-shot information spectrum achievability
bound for resolvability under the excess information metric,
which then automatically implies a bound under the $E_\gamma$ metric.
Consider the setting of Figure 1. The input to the channel
$Q_{X|U}$ is equiprobably selected from a codebook
$(c_l)_{l=1}^{L} \in \mathcal{U}^L$.

It turns out that i.i.d. codewords are usually
good enough, and the expected distance from the synthesized
distribution $P_{X(c)}$ to the target distribution $\pi_X$ under the
excess information metric is gauged as follows:



Figure 1: Synthesizing a target distribution $\pi_X$ using a random
number generator and a codebook


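Theorem 2. Fix $\pi_X$ and $Q_{UX} = Q_U Q_{X|U}$. Let $c =
[c_1, \dots, c_L]$ be i.i.d. according to $Q_U$. Define

$$P_{X(c)} := \frac{1}{L}\sum_{l=1}^{L} Q_{X|U=c_l}. \tag{12}$$

Then for any $\tau, \gamma, \epsilon, \sigma > 0$ satisfying $\gamma > \epsilon + \sigma$ and $0 < \delta < 1$,
it holds that

$$\begin{aligned}
\mathbb{P}\left[\frac{dP_{X(c)}}{d\pi_X}(\hat{X}) > \gamma\right]
&\le \mathbb{P}\left[\frac{dQ_X}{d\pi_X}(X) > \gamma - \sigma - \epsilon\right]
+ \mathbb{P}\left[\frac{dQ_{X|U}}{d\pi_X}(X|U) > \delta L \sigma\right] \\
&\quad + \frac{\exp(\tau)(\gamma - \sigma - \epsilon)^2}{L(1-\delta)^2\sigma^2}
+ \frac{\gamma - \sigma - \epsilon}{\epsilon}\,\mathbb{P}[\imath_{U;X}(U;X) > \tau]
\end{aligned} \tag{13}$$

where conditioned on $c$, $\hat{X} \sim P_{X(c)}$, and $(U, X) \sim Q_U Q_{X|U}$.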





Remark 3. By setting $\tau \downarrow -\infty$ and letting $\delta \uparrow 1$, the bound in
Theorem 2 can be weakened to the following slightly simpler
form:

$$\mathbb{P}\left[\frac{dP_{X(c)}}{d\pi_X}(\hat{X}) > \gamma\right]
\le \mathbb{P}\left[\frac{dQ_X}{d\pi_X}(X) > \gamma - \sigma - \epsilon\right]
+ \mathbb{P}\left[\frac{dQ_{X|U}}{d\pi_X}(X|U) > L\sigma\right]
+ \frac{\gamma - \sigma - \epsilon}{\epsilon}. \tag{14}$$

The weakened bound (14) is still asymptotically tight provided
that the exponent with which the threshold $\gamma$ grows is positive;
see Corollary 4 below. However, when the exponent is zero
(corresponding to the total variation case), we do need $\tau$ in
the bound for asymptotic tightness.


The proof of Theorem 2 is omitted due to space limitations.
Next we particularize Theorem 2 to the case of stationary
memoryless channels and an exponentially growing threshold
$\gamma$, to obtain an explicit single-letter formula for the tradeoff
between $R$ and the exponent of $\gamma$:

Corollary 4. Fix per-letter distributions $\pi_X$ and $Q_{UX} =
Q_U Q_{X|U}$. Let $c = [c_1, \dots, c_L]$ be i.i.d. according to $Q_U^{\otimes n}$.
Define

$$P_{X^n(c)} := \frac{1}{L}\sum_{l=1}^{L} Q_{X^n|U^n=c_l}. \tag{15}$$

Suppose $\gamma = \exp(nE)$ and $L = \exp(nR)$. Then

$$\lim_{n\to\infty} \mathbb{P}\left[\frac{dP_{X^n(c)}}{d\pi_X^{\otimes n}}(\hat{X}^n) > \gamma\right] = 0 \tag{16}$$

provided that

$$E > D(Q_X\|\pi_X) + [I(Q_U, Q_{X|U}) - R]^+, \tag{17}$$

where conditioned on $c$, the vector $\hat{X}^n \sim P_{X^n(c)}$. Moreover,
the bound in (17) is tight.

Proof of Achievability: Choose $E'$ such that

$$E > E' > D(Q_X\|\pi_X) + [I(Q_U, Q_{X|U}) - R]^+. \tag{18}$$

Set $\delta = \frac{1}{2}$, $\gamma = \exp(nE)$, $L = \exp(nR)$, $\epsilon = \exp(nE) -
\exp(nE')$ and $\sigma = \delta(\gamma - \epsilon) = \delta\exp(nE')$, and apply (14).
Notice that

$$\mathbb{E}\left[\log\frac{dQ_{X^n|U^n}}{d\pi_X^{\otimes n}}(X^n|U^n)\right] = n\left[I(Q_U, Q_{X|U}) + D(Q_X\|\pi_X)\right] \tag{19}$$

where $(X^n, U^n) \sim Q_{XU}^{\otimes n}$. By the law of large numbers, the first
and second terms in (14) vanish because

$$D(Q_X\|\pi_X) < E'; \tag{20}$$

$$I(Q_U, Q_{X|U}) + D(Q_X\|\pi_X) < E' + R \tag{21}$$

are satisfied. The third term equals
$(1-\delta)\exp(nE')/(\exp(nE) - \exp(nE'))$, which vanishes since $E' < E$. ■

The basic idea for the proof of the tightness of (17)
(the converse) is as follows: given a codebook $c$ define
$\mathcal{A} := \bigcup_{l=1}^{L} \mathcal{T}^n_{Q_{X|U},\delta}(c_l)$, where $\mathcal{T}^n_{Q_{X|U},\delta}(c_l)$ denotes the $Q_{X|U}$-typical
sequences given $c_l$. Then it can be shown that when
$E$ is less than the right hand side of (17), it holds that
$(P_{X^n(c)} - \gamma\pi_X^{\otimes n})(\mathcal{A}) \to 1$ for some $\delta > 0$.

IV. Application to Lossy Source Coding

The simplest application of the new resolvability result is to
derive a one-shot achievability bound for source coding, which
is most fitting in the regime of low rate and exponentially
decreasing success probability. The method is applicable to
general sources. In the special case of i.i.d. sources, it recovers
the “success exponent” in lossy source coding originally
derived by the method of types [13] for discrete memoryless
sources.

Theorem 5. Consider a source with distribution $\pi_X$ and a
distortion function $d(\cdot,\cdot)$ on $\mathcal{U} \times \mathcal{X}$. For any distribution
$Q_U Q_{X|U}$, $\gamma \ge 1$, $d > 0$ and integer $L$, there exists a stochastic
encoder $\pi_{U|X}$ such that the size of the support of $\pi_U$ is at most
$L$ and

$$\mathbb{P}[d(\hat{U}, X) \le d] \ge \frac{1}{\gamma}\left(\mathbb{P}[d(\bar{U}, \bar{X}) \le d] - \epsilon\right) \tag{22}$$

where $(\hat{U}, X) \sim \pi_{UX}$, $(\bar{U}, \bar{X}) \sim Q_{UX}$, and $\epsilon$ is an upper
bound on the right hand side of (13).

Proof: Given a codebook $(c_1, \dots, c_L) \subseteq \mathcal{U}$, let $P_U$ be the
equiprobable distribution on $(c_1, \dots, c_L)$ and set

$$P_{UX} := Q_{X|U} P_U. \tag{23}$$

The likelihood encoder is then defined as a random
transformation

$$\pi_{U|X} := P_{U|X} \tag{24}$$

so that the joint distribution of the codeword selected and the
source realization $X$ is

$$\pi_{UX} = \pi_X P_{U|X}. \tag{25}$$

From Proposition 1 we obtain

$$\begin{aligned}
\gamma\,\mathbb{P}[d(\hat{U}, X) \le d]
&\ge \mathbb{P}[d(\tilde{U}, \tilde{X}) \le d] - E_\gamma(P_{XU}\|\pi_{XU}) \\
&= \mathbb{P}[d(\tilde{U}, \tilde{X}) \le d] - E_\gamma(P_X\|\pi_X)
\end{aligned} \tag{26}$$

where $(\tilde{U}, \tilde{X}) \sim P_{UX}$, which yields

$$\gamma\,\mathbb{E}_c\,\mathbb{P}[d(\hat{U}, X) \le d] \ge \mathbb{P}[d(\bar{U}, \bar{X}) \le d] - \mathbb{E}_c E_\gamma(P_X\|\pi_X) \tag{27}$$

where in (27) we used the fact that $\mathbb{E}_c P_{UX} = Q_{UX}$. Since
$E_\gamma(P_X\|\pi_X)$ is upper-bounded by the excess information
$\mathbb{P}[\frac{dP_X}{d\pi_X}(\hat{X}) > \gamma]$, the last term in (27) is at most $\epsilon$ by
Theorem 2. Finally
we can choose a codebook such that $\mathbb{P}[d(\hat{U}, X) \le d]$ is at
least its expectation. ■
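The construction in the proof of Theorem 5 can be sketched on finite alphabets as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the binary example channel, the source distribution and the Hamming distortion table are all hypothetical. The sketch builds the joint distribution induced by an equiprobable codeword passed through $Q_{X|U}$, forms the likelihood encoder as the posterior of the codeword given the source symbol, and returns the success probability under the source, the success probability under the synthesized joint distribution, and $E_\gamma(P_X\|\pi_X)$.

```python
def likelihood_encoder(codebook, q_x_given_u, pi_x):
    """Return (P_UX over codebook positions, P_X, encoder P_{U|X})."""
    L, nx = len(codebook), len(pi_x)
    # P_UX(l, x) = (1/L) * Q_{X|U}(x | c_l): equiprobable codeword, then the channel.
    p_joint = [[q_x_given_u[u][x] / L for x in range(nx)] for u in codebook]
    p_x = [sum(p_joint[l][x] for l in range(L)) for x in range(nx)]
    # Posterior P_{U|X}(l | x). Where P_X(x) = 0 the posterior is undefined,
    # so a uniform choice is used there (an arbitrary convention of this sketch).
    encoder = [
        [p_joint[l][x] / p_x[x] if p_x[x] > 0 else 1.0 / L for l in range(L)]
        for x in range(nx)
    ]
    return p_joint, p_x, encoder


def success_terms(codebook, q_x_given_u, pi_x, dist, d_max, gamma):
    """Return (success prob. under pi_UX, success prob. under P_UX, E_gamma(P_X||pi_X))."""
    p_joint, p_x, encoder = likelihood_encoder(codebook, q_x_given_u, pi_x)
    L, nx = len(codebook), len(pi_x)
    good = [[dist[codebook[l]][x] <= d_max for x in range(nx)] for l in range(L)]
    # Source symbol drawn from pi_X, codeword drawn from the likelihood encoder.
    succ_source = sum(pi_x[x] * encoder[x][l]
                      for l in range(L) for x in range(nx) if good[l][x])
    # Codeword drawn uniformly, symbol drawn through Q_{X|U}.
    succ_synth = sum(p_joint[l][x]
                     for l in range(L) for x in range(nx) if good[l][x])
    # E_gamma(P_X || pi_X) = sum_x (P_X(x) - gamma * pi_X(x))^+.
    eg = sum(max(p_x[x] - gamma * pi_x[x], 0.0) for x in range(nx))
    return succ_source, succ_synth, eg


bsc = [[0.9, 0.1], [0.1, 0.9]]   # hypothetical Q_{X|U}; rows indexed by u
hamming = [[0, 1], [1, 0]]       # hypothetical distortion table dist[u][x]
pi_x = [0.8, 0.2]                # hypothetical source distribution
print(success_terms([0, 1], bsc, pi_x, hamming, 0, 1.5))
```

For every codebook and every $\gamma \ge 1$, the three returned values satisfy $\gamma \cdot$ `succ_source` $\ge$ `succ_synth` $-$ `eg`, which is the per-codebook inequality used in the proof before averaging over the random codebook.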

Remark 6. In the i.i.d. setting, let $R(\pi_X, d)$ be the
rate-distortion function when the source has per-letter distribution
$\pi_X$. The distortion function for the block is derived from the
per-letter distortion by

$$d^{(n)}(u^n, x^n) := \frac{1}{n}\sum_{i=1}^{n} d(u_i, x_i). \tag{28}$$

Let $(X^n, \hat{U}^n)$ be the source-reconstruction pair distributed
according to $\pi_{X^n\hat{U}^n}$. If $0 < R < R(\pi_X, d)$, the maximal
probability that the distortion does not exceed $d$ converges
to zero with the exponent

$$\lim_{n\to\infty} \frac{1}{n}\log\frac{1}{\mathbb{P}[d^{(n)}(\hat{U}^n, X^n) \le d]} = G(R, d) \tag{29}$$

where

$$G(R, d) := \min_{Q}\left[D(Q\|\pi_X) + [R(Q, d) - R]^+\right]. \tag{30}$$

A weaker achievability result than (30) was proved in [14,
p168], whereas the final form (30) is given in [11, p158,
Ex6] based on the method of types. Here we can easily prove the
achievability part of (30) using Theorem 5 and Corollary 4 by
setting $Q_X$ to be the minimizer of (30) and $Q_{U|X}$ to be such
that

$$\mathbb{E}\,d(U, X) \le d, \tag{31}$$

$$I(Q_X, Q_{U|X}) \le R(Q_X, d). \tag{32}$$

Then $\gamma_n = \exp(nE)$ with

$$E > D(Q_X\|\pi_X) + [I(U;X)_Q - R]^+, \tag{33}$$

ensures that

$$\mathbb{P}[d^{(n)}(\hat{U}^n, X^n) \le d] \ge \frac{1}{2}\exp(-nE) \tag{34}$$

for $n$ large enough, by the law of large numbers.
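To make the success exponent concrete, the following minimal sketch evaluates $G(R,d)$ by grid search for a Bernoulli source under Hamming distortion. The setting is an assumption made for illustration: it relies on the standard closed form of the binary rate-distortion function, $h(\min(q, 1-q)) - h(d)$ when $d < \min(q, 1-q)$ and zero otherwise, and all function names and the grid resolution are hypothetical. Quantities are in nats, matching the convention $\gamma = \exp(nE)$.

```python
import math


def h(q):
    """Binary entropy in nats."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)


def kl_binary(q, p):
    """D(Bernoulli(q) || Bernoulli(p)) in nats, for 0 < p < 1."""
    total = 0.0
    for a, b in ((q, p), (1 - q, 1 - p)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total


def rate_distortion_binary(q, d):
    """R(Q, d) for a Bernoulli(q) source under Hamming distortion, 0 <= d <= 1/2."""
    m = min(q, 1 - q)
    return h(m) - h(d) if d < m else 0.0


def success_exponent(p, rate, d, grid=2000):
    """Grid-search approximation of min_Q [D(Q||P) + [R(Q,d) - rate]^+]."""
    # The source parameter p is added explicitly so that Q = P is always a candidate.
    candidates = [i / grid for i in range(grid + 1)] + [p]
    return min(
        kl_binary(q, p) + max(rate_distortion_binary(q, d) - rate, 0.0)
        for q in candidates
    )


# Hypothetical example: Bernoulli(0.3) source, distortion level 0.1.
for rate in (0.0, 0.1, 0.2, 0.3):
    print(rate, success_exponent(0.3, rate, 0.1))
```

The exponent is zero once the rate reaches the rate-distortion function of the source (the choice $Q = \pi_X$ is then optimal) and is nonincreasing in the rate.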

Remark 7. Since the $E_\gamma$ metric reduces to TV when $\gamma = 1$,
Theorem 5 generalizes the likelihood source encoder based on
the standard soft-covering/resolvability lemma [8]. In [8], the
error exponent for the likelihood source encoder at rates above
the rate-distortion function is analyzed using the exponential
decay of TV in the approximation of output statistics, and
the exponent does not match the optimal exponent in [13].
It is also possible to upper-bound the success exponent of the
TV-based likelihood encoder at rates below the rate-distortion
function by analyzing the exponential convergence to 2 of TV
in the approximation of output statistics; however that does
not yield the optimal exponent (30) either. The power of
$E_\gamma$-resolvability lies in the ability to convert a large deviation
analysis into an exercise in the law of large numbers, that
is, we only care about whether $E_\gamma$ converges to 0, but not the
speed, even when dealing with error exponent problems.

V. Application to Wiretap Channels 

Next we apply the $E_\gamma$-resolvability to the wiretap
channel $P_{YZ|X}$ depicted in Figure 2. The receiver and the
eavesdropper observe $y \in \mathcal{Y}$ and $z \in \mathcal{Z}$, respectively. Given
a codebook $(c_{wl})$, the input to the channel is $c_{wl}$, where
$w \in \{1, \dots, M\}$ is the message to be sent and $l$ is
equiprobably chosen from $\{1, \dots, L\}$ to randomize the eavesdropper’s
observation. Moreover, the eavesdropper’s observation has the
distribution $\pi_Z$ when no message is sent. For general wiretap
channels the performance may be enhanced by appending a
conditioning channel $Q_{X|U}$ at the output of the encoder [6].
But in that case the same analysis can be carried out for the
new wiretap channel $Q_{YZ|U}$. Thus the model in Figure 2
entails no loss of generality. We need the following definitions
to quantify the eavesdropper’s knowledge.

Figure 2: The wiretap channel

Definition 8. For a fixed codebook we say the eavesdropper
can perform $(A, T, \epsilon)$-decoding if, when no message is sent,
it detects no message with probability at least $1 - A^{-1}$; and
when a message $m$ is sent, it can produce a list of $T$ messages
containing $m$ with probability at least $1 - \epsilon_m$ such that

$$\frac{1}{M}\sum_{m=1}^{M}\epsilon_m \le \epsilon. \tag{35}$$

For stationary memoryless channels, the quantities $P_{ZY|X}$,
$M$ and $L$ in Figure 2 are identified as $P_{Z^nY^n|X^n}$, $\exp(nR)$ and
$\exp(nR_L)$.

We consider an $(M, L, Q_X)$-random code, which is defined
as the ensemble of the codebook $(c_{wl})$, $w \in \{1, \dots, M\}$,
$l \in \{1, \dots, L\}$, where each codeword is chosen i.i.d. according
to $Q_X$. The following definition captures the asymptotic
performance of the eavesdropper:

Definition 9. Fix $(R, R_L)$. The rate pair $(\alpha, \tau)$ is $\epsilon$-achievable
by the eavesdropper if there exist sequences $\{A_n\}$ and $\{T_n\}$
with

$$\lim_{n\to\infty}\frac{1}{n}\log T_n = \tau \tag{36}$$

$$\lim_{n\to\infty}\frac{1}{n}\log A_n = \alpha \tag{37}$$

such that for sufficiently large $n$, the eavesdropper can achieve
$(A_n, T_n, \epsilon)$-decoding with high probability when the codebook
is the $(\exp(nR), \exp(nR_L), Q_X^{\otimes n})$-random code.

Then we have the following result: 

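Theorem 10. For any $Q_X$, $R$, $R_L$ and $0 < \epsilon < 1$, the pair
$(\alpha, \tau)$ is $\epsilon$-achievable by the eavesdropper in the sense of
Definition 9 iff

$$\begin{cases}
\alpha < D(Q_Z\|\pi_Z) + [I(Q_X, P_{Z|X}) - R - R_L]^+ \\
\tau > R - [I(Q_X, P_{Z|X}) - R_L]^+
\end{cases} \tag{38}$$

where $Q_X \to P_{Z|X} \to Q_Z$.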
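The two thresholds in Theorem 10 are simple functions of a relative entropy and a mutual information, so they can be evaluated directly on finite alphabets. The sketch below is a minimal illustration under stated assumptions: distributions are Python lists, the eavesdropper's channel is a row-stochastic list of lists indexed by the input, all quantities are in nats, and the function names and the binary symmetric example are hypothetical.

```python
import math


def kl(p, q):
    """Relative entropy D(p||q) in nats; infinite if p is not dominated by q."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0.0:
            if b == 0.0:
                return math.inf
            total += a * math.log(a / b)
    return total


def output_distribution(q_x, channel):
    """Q_Z obtained by passing Q_X through P_{Z|X} (rows of `channel` indexed by x)."""
    nz = len(channel[0])
    return [sum(q_x[x] * channel[x][z] for x in range(len(q_x))) for z in range(nz)]


def mutual_information(q_x, channel):
    """I(Q_X, P_{Z|X}) in nats, as the average of D(P_{Z|X=x} || Q_Z)."""
    q_z = output_distribution(q_x, channel)
    return sum(q_x[x] * kl(channel[x], q_z) for x in range(len(q_x)) if q_x[x] > 0.0)


def eavesdropper_thresholds(q_x, p_z_given_x, pi_z, rate, rate_l):
    """Return (upper threshold on alpha, lower threshold on tau) from Theorem 10."""
    i_z = mutual_information(q_x, p_z_given_x)
    q_z = output_distribution(q_x, p_z_given_x)
    alpha_threshold = kl(q_z, pi_z) + max(i_z - rate - rate_l, 0.0)
    tau_threshold = rate - max(i_z - rate_l, 0.0)
    return alpha_threshold, tau_threshold


# Hypothetical example: uniform input, BSC(0.2) to the eavesdropper,
# and a skewed no-message distribution pi_Z.
q_x = [0.5, 0.5]
bsc_z = [[0.8, 0.2], [0.2, 0.8]]
pi_z = [0.9, 0.1]
print(eavesdropper_thresholds(q_x, bsc_z, pi_z, rate=0.1, rate_l=0.3))
```

When the randomization rate exceeds the eavesdropper's mutual information, the threshold on $\tau$ equals the full message rate $R$ and the threshold on $\alpha$ reduces to $D(Q_Z\|\pi_Z)$.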

Remark 11. From the noisy channel coding theorem, the
supremum randomization rate $R_L$ such that the sender can
reliably transmit messages at the rate $R$ is $I(Q_X, P_{Y|X}) - R$.
The larger $R_L$, the less reliably the eavesdropper can decode,
so the optimal encoder chooses $R_L$ as close to this supremum
as possible. Thus Theorem 10 implies that to reliably transmit
messages at the rate $R$, codebooks can be selected such that the
eavesdropper cannot perform $(\exp(n\alpha), \exp(n\tau), \epsilon)$-decoding for large
$n$ if there exists some $Q_X$ such that

$$\alpha > D(Q_Z\|\pi_Z) + [I(Q_X, P_{Z|X}) - I(Q_X, P_{Y|X})]^+ \tag{39}$$

or

$$\tau < R - [I(Q_X, P_{Z|X}) - I(Q_X, P_{Y|X}) + R]^+. \tag{40}$$

Remark 12. In general the sender and receiver want to minimize
$\alpha$ and maximize $\tau$ obeying the tradeoff (39), (40) by selecting
$Q_X$. In the special case where $\alpha$ has no importance and $R$ is
larger than the secrecy capacity $C := \sup_{Q_X}\{I(Q_X, P_{Y|X}) -
I(Q_X, P_{Z|X})\}$, we see from (40) that the supremum $\tau$ is $C$.
The formula is the same as the equivocation measure defined
as $\frac{1}{n}H(W|Z^n)$ [12], but technically our result does not follow
directly from the lower bound on equivocation, since it may be
possible that the a posteriori distribution of $W$ is concentrated
on a small list but has a tail spread over an exponentially large
set, resulting in a large equivocation.

The (eavesdropper) achievability part of Theorem 10
follows by analyzing the eavesdropper's decoding ability for
different cases of the rates $(R, R_L)$. The (eavesdropper) converse
part of Theorem 10 follows by applying the following
non-asymptotic bounds to different cases of $(R, R_L)$ and invoking
Corollary 4.

Theorem 13. In the wiretap channel, fix an arbitrary
distribution $\mu_Z$ and a measurable subset $\mathcal{D}_0 \subseteq \mathcal{Z}$. Suppose the
eavesdropper can either detect that no message is sent upon
observing $z \in \mathcal{D}_0$ with

$$\pi_Z(\mathcal{D}_0) \ge 1 - A^{-1} \tag{41}$$

or output a list of $T(z)$ messages upon observing $z \notin \mathcal{D}_0$ that
contains the actual message $m \in \{1, \dots, M\}$ with probability
at least $1 - \epsilon_m$. Define the average quantities

$$T := \int T(z)\,d\mu_Z(z), \tag{42}$$

$$\epsilon := \frac{1}{M}\sum_{m=1}^{M}\epsilon_m. \tag{43}$$

Then, 

^ (44) 

where we recall that $\pi_Z$ is the non-message distribution, and

$$\frac{T}{M} \ge \frac{1}{\gamma}\left(1 - \epsilon - \frac{1}{M}\sum_{m=1}^{M} E_\gamma(P_{Z|W=m}\|\mu_Z)\right). \tag{45}$$

From the eavesdropper's viewpoint, a larger $A$ and a smaller
$T$ are more desirable since it will then be able to find out that no
message is sent with smaller error probability or narrow down
to a smaller list when a message is sent. This observation
agrees with (44) and (45): a smaller $\gamma$ implies a higher degree
of approximation, and hence higher indistinguishability of
output distributions, which is to the eavesdropper's disadvantage.
VI. Discussion 

As we have demonstrated, the achievability part of
resolvability in $E_\gamma$ has various applications in information theory,
especially for bounding rare event probabilities (cf. (22), (44)
and (45)). However the asymmetry of $E_\gamma$ (when $\gamma > 1$) places
a limitation on $E_\gamma$-resolvability in certain problems. In
particular, there is no counterpart of Theorem 2 for $E_\gamma(\pi_X\|P_X)$.